Methods, Requirements and Licenses for Shared NLG Resources
Abstract
Tools and data that can be shared in the Natural Language Generation community require common standards for data collection, documentation, implementation and licensing, as well as a central place to find such resources. We argue for open, free, well-documented and simply-structured resources, and introduce a free and open online repository of NLG resources.

1 Shared tasks and shared resources

Work in Natural Language Processing has come to depend on automated evaluation schemes which provide empirical measures of success. This has led to constructive competition between groups, for instance in syntactic parsing, information extraction, and machine translation. It has also led to rapid improvements on localized problems, but not yet to large end-to-end systems. Such schemes need large quantities of common, annotated data to deduce statistical relationships between produced and desired output. They need common and reusable, and therefore controlled, test components or tasks.

Meanwhile, NLG groups have used hand-crafted data to demonstrate and qualitatively evaluate their systems. The NLG community has yet to produce common, standardized datasets, although this has slowly been changing. For instance, participants at the 2005 European Workshop on NLG expressed their desire to establish shared data, consisting of structured databases with domain-specific content and “gold standard” human-written results. Shared resources could allow for a centralized and coordinated evaluation of systems performing a shared task. Similarly, new systems could be evaluated against older competing ones.

In this context, the idea of reusable software tools becomes very attractive. Any new NLG module needs to interface with existing ones before it can be sensibly evaluated by humans. For instance, a new adaptive realizer for a dialogue system will need a backend to supply selected content and a user model. Unfortunately, many of the systems created are domain-specific demos.
Here, the underlying, novel principles are meant to be reusable, but their implementations are not. In the following, we propose some requirements for sharable data which should make implementations subsequently more reusable. We also announce a common directory of resources, which is already available to the community.

On the downside, we want to avoid a situation where researchers produce code to beat the automatic score rather than make progress on solving the important challenges. For instance, summarization research shies away from systems that aggregate and paraphrase because they would not score well under the standard ROUGE metric. Automatic evaluation must therefore be used with caution.

2 Other Fields and their Ethical and Organizational Considerations

Fields differ greatly in their adoption of data sharing. Genetics researchers, for instance, adopted the idea early on, and the US National Institutes of Health mandated it in 2003. Even without being pressured by funding bodies, we are under a moral obligation to share data once it has been collected with the help of public funds. To address concerns about the validity of studies carried out with these data, the exact collection method should be documented, so that resulting constraints on the analysis are evident.

Ethical considerations are important in any setting. Carrying out NLG data collection experiments will commonly pose few difficulties. However, we suggest that participants be asked to permit the free dissemination of the recorded data, even if distribution is not immediately planned. Any anonymization, where necessary for privacy reasons, should be planned and agreed beforehand.
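The caution about ROUGE can be made concrete. The sketch below computes a ROUGE-1-style unigram recall, assuming simple whitespace tokenization; the example sentences are invented for illustration. A near-verbatim extract scores high, while a faithful paraphrase scores near zero, which is exactly the bias against aggregation and paraphrasing described above.

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams also found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference word counts at most as often as it occurs.
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

reference = "the committee approved the budget on friday"
extractive = "the committee approved the budget"          # near-verbatim extract
paraphrase = "lawmakers signed off on the spending plan"  # same meaning, new words

print(rouge1_recall(extractive, reference))  # high: most reference words reappear
print(rouge1_recall(paraphrase, reference))  # low, despite equivalent content
```

This is of course a simplification of the full metric (which also covers longer n-grams and longest common subsequences), but it shows why surface-overlap scores reward copying over rephrasing.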
Similar resources
A Repository of Data and Evaluation Resources for Natural Language Generation
Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. In other contexts too, sharable NLG data is now being created. In this paper, we describe the onlin...
Pragmatic Influences on Sentence Planning and Surface Realization: Implications for Evaluation
Three questions to ask of a proposal for a shared evaluation task are: whether to evaluate, what to evaluate and how to evaluate. For NLG, shared evaluation resources could be a very positive development. In this statement I address two issues related to the what and how of evaluation: establishing a “big picture” evaluation framework, and evaluating generation in context.
Automatic Evaluation of Referring Expression Generation Is Possible
Shared evaluation metrics and tasks are now well established in many fields of Natural Language Processing. However, the Natural Language Generation (NLG) community is still lacking common methods for assessing and comparing the quality of systems. A number of issues that complicate automatic evaluation of NLG systems have been discussed in the literature. The most fundamental observation in ...
Discussion Panel on Evaluation in Generation Research
Evaluation is critical in offering feedback on progress to both developers and potential consumers of NLG technology. However, evaluation has thus far not been as well established in NLG as it has become in NLU. This panel will discuss evaluation methods and resources. It is aimed at building a better understanding of NLG evaluation methods, and hopefully arriving at steps to facilitate future ev...
Validating the web-based evaluation of NLG systems
The GIVE Challenge is a recent shared task in which NLG systems are evaluated over the Internet. In this paper, we validate this novel NLG evaluation methodology by comparing the Internet-based results with results we collected in a lab experiment. We find that the results delivered by both methods are consistent, but the Internet-based approach offers the statistical power necessary for more fi...
Publication date: 2006